Time-Constrained Sequential Pattern Mining

Author

  • Ming-Yen Lin
Abstract

Sequential pattern mining is one of the important issues in data mining research (Agrawal & Srikant, 1995; Ayres, Gehrke, & Yiu, 2002; Han, Pei, & Yan, 2004; Lin & Lee, 2004; Lin & Lee, 2005b; Roddick & Spiliopoulou, 2002). A typical example is a retail database where each record corresponds to a customer’s purchasing sequence, called a data sequence. A data sequence is composed of all the customer’s transactions ordered by transaction time. Each transaction is represented by a set of literals indicating the set of items (called an itemset) purchased in the transaction. The objective is to find all the frequent sub-sequences (called sequential patterns) in the sequence database. Whether a sub-sequence is frequent is determined by its frequency, named support, in the sequence database. An example sequential pattern might be that 40% of customers bought a PC and a printer, followed by the purchase of a scanner and graphics software, and then a digital camera. Such a pattern, denoted by <(PC, printer)(scanner, graphics-software)(digital camera)>, has three elements, where each element is an itemset. Although the issue is motivated by the retail industry, the mining technique is applicable to domains bearing sequence characteristics, including the analysis of Web traversal patterns, medical treatments, natural disasters, DNA sequences, and so forth.

To obtain more accurate results, constraints in addition to the support threshold need to be specified in the mining (Pei, Han, & Wang, 2007; Chen & Yu, 2006; Garofalakis, Rastogi, & Shim, 2002; Lin & Lee, 2005a; Masseglia, Poncelet, & Teisseire, 2004). Most time-independent constraints can be handled, without modifying the fundamental mining algorithm, by postprocessing the result of sequential pattern mining without constraints. Time constraints, however, cannot be handled by filtering the retrieved patterns afterward, because the support computation of a pattern must validate the time attributes of every data sequence during the mining process. Therefore, time-constrained sequential pattern mining (Lin & Lee, 2005a; Lin, Hsueh, & Chang, 2006; Masseglia, Poncelet, & Teisseire, 2004) is more challenging, and more important for temporal relationship discovery, than conventional pattern mining.
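The point that time constraints must be checked during support counting can be illustrated with a minimal Python sketch. It is not the algorithm from the cited works. The constraint names `min_gap` and `max_gap` (bounds on the time difference between the transactions matching consecutive pattern elements), the function names, and the three-customer database are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# A transaction is (timestamp, itemset); a data sequence is sorted by timestamp.
Transaction = Tuple[int, FrozenSet[str]]


def contains(data_seq: Sequence[Transaction],
             pattern: Sequence[FrozenSet[str]],
             min_gap: int = 0,
             max_gap: Optional[int] = None) -> bool:
    """Check whether data_seq supports pattern under the assumed gap constraints.

    Each pattern element must be a subset of a distinct, later transaction, and
    the time difference between consecutive matched transactions must satisfy
    min_gap <= gap <= max_gap (max_gap=None means no upper bound).
    """
    def search(elem_idx: int, start: int, prev_time: Optional[int]) -> bool:
        if elem_idx == len(pattern):
            return True  # every element has been matched
        for i in range(start, len(data_seq)):
            t, items = data_seq[i]
            if prev_time is not None:
                gap = t - prev_time
                if gap < min_gap:
                    continue  # too close in time, try a later transaction
                if max_gap is not None and gap > max_gap:
                    break  # sorted by time, so later gaps only grow
            # Backtrack over alternative matches instead of taking the first one.
            if pattern[elem_idx] <= items and search(elem_idx + 1, i + 1, t):
                return True
        return False

    return search(0, 0, None)


def support(database: Sequence[Sequence[Transaction]],
            pattern: Sequence[FrozenSet[str]],
            min_gap: int = 0,
            max_gap: Optional[int] = None) -> float:
    """Fraction of data sequences in the database that contain the pattern."""
    hits = sum(contains(seq, pattern, min_gap, max_gap) for seq in database)
    return hits / len(database)


# Hypothetical database of three customers (timestamps in arbitrary units).
db: List[List[Transaction]] = [
    [(1, frozenset({"PC", "printer"})),
     (5, frozenset({"scanner", "graphics-software"})),
     (9, frozenset({"digital camera"}))],
    [(1, frozenset({"PC", "printer"})),
     (40, frozenset({"scanner", "graphics-software"})),
     (45, frozenset({"digital camera"}))],
    [(2, frozenset({"PC"})), (3, frozenset({"scanner"}))],
]
pattern = [frozenset({"PC", "printer"}),
           frozenset({"scanner", "graphics-software"}),
           frozenset({"digital camera"})]

unconstrained = support(db, pattern)
constrained = support(db, pattern, max_gap=30)
```

With these hypothetical sequences, the unconstrained support is 2/3 and drops to 1/3 under a maximum gap of 30 time units. The set of mined patterns alone does not reveal this; the timestamps in each data sequence have to be re-examined.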

Similar resources

Efficiently Mining Closed Subsequences with Gap Constraints

Mining frequent subsequence patterns from sequence databases is a typical data mining problem, and various efficient sequential pattern mining algorithms have been proposed. In many problem domains (e.g., biology), the frequent subsequences confined by predefined gap requirements are more meaningful than general sequential patterns. In this paper we re-examine the closed sequential patter...

Discovering Contiguous Sequential Patterns in Network-Constrained Movement

A large proportion of movement in urban areas, such as pedestrian, bicycle, and vehicle movement, is constrained to a road network. That movement information is commonly collected by Global Positioning System (GPS) sensors, which have generated large collections of trajectories. A contiguous sequential pattern (CSP) in these trajectories represents a certain number of objects traversing a sequence of spatiall...

Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks

This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems as well as knowledge representation and reasoning. Thus, ASP is a good candidate for implementing pattern mining with background knowledge, which has been a data mining issue f...

Acquiring Background Knowledge for Intelligent Tutoring Systems

One of the unresolved problems faced in the construction of intelligent tutoring systems is the acquisition of background knowledge, either for the specification of the teaching strategy or for the construction of the student model, identifying the deviations of students’ behavior. In this paper, we argue that sequential pattern mining and constraint relaxations can be used to autom...

Distributed Sequential Pattern Mining: A Survey and Future Scope

Distributed sequential pattern mining is a data mining method for discovering sequential patterns from large sequence databases in a distributed environment. It is used in a wide range of applications, including web mining, customer shopping records, biomedical analysis, scientific research, etc. Much research has been done on sequential pattern mining in various distributed environments like Grid, Had...

Publication date: 2009